Automatic Idiom Identification Model for Amharic Language

Abstract

Idiomatic expressions are important, natural parts of all languages and are prominent in our daily speech. Because an idiom cannot be interpreted directly from the words that form it, people may not understand its meaning. Past literature notes that idioms affect Natural Language Processing research such as machine translation, semantic analysis, and sentiment analysis. Idioms in other languages, such as English, Chinese, and Indian languages, have been recognized through different methods in prior research. As there is no standard method to identify Amharic idioms, this study aimed to build an idiom identification model for the Amharic language using a supervised learning approach. The study used eight hundred labeled examples for training and two hundred for testing, collected from books such as “የአማርኛ ፈሊጦች” (Amharic Idioms) and other documents. To measure the performance of the model, we used accuracy, precision, recall, and F-score. Finally, an accuracy of 97.5% was achieved on the dataset, a promising result. The study contributes to the information systems discourse by improving researchers' awareness and knowledge of idioms.
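The abstract evaluates the model with accuracy, precision, recall, and F-score. The function below is a minimal sketch of how these four metrics are conventionally computed for a binary idiom / non-idiom labeling. It assumes idioms are encoded as 1 and non-idioms as 0 and that "F-score" means the balanced F1 score; both are assumptions, since the paper's label encoding, classifier, and evaluation code are not described here. The example labels are invented for illustration and are not the paper's data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary labeling.

    Assumed (hypothetical) encoding: 1 = idiom, 0 = not an idiom.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one example")

    # Count the four cells of the confusion matrix.
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # idiom correctly flagged
        elif p == positive:
            fp += 1          # literal phrase wrongly flagged as idiom
        elif t == positive:
            fn += 1          # idiom missed
        else:
            tn += 1          # literal phrase correctly left unflagged

    accuracy = (tp + tn) / len(y_true)
    # Convention used here: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Balanced F-score (F1): harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


if __name__ == "__main__":
    # Invented toy labels: 4 idioms followed by 4 literal phrases.
    gold = [1, 1, 1, 1, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 0, 1]
    metrics = binary_metrics(gold, predicted)
    print({k: round(v, 3) for k, v in metrics.items()})
    # prints {'accuracy': 0.625, 'precision': 0.667, 'recall': 0.5, 'f_score': 0.571}
```

Reporting 0.0 when a denominator is zero is one common convention (scikit-learn's `precision_score` exposes this choice through its `zero_division` parameter); other conventions exist, so a reported score depends on which one an evaluation adopts.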

Similar Resources

Automatic Idiom Identification in Wiktionary

Online resources, such as Wiktionary, provide an accurate but incomplete source of idiomatic phrases. In this paper, we study the problem of automatically identifying idiomatic dictionary entries with such resources. We train an idiom classifier on a newly gathered corpus of over 60,000 Wiktionary multi-word definitions, incorporating features that model whether phrase meanings are constructed ...

Automatic speech recognition for an under-resourced language - Amharic

In this paper we present the development of an Automatic Speech Recognition System (ASRS) for Amharic using limited available resources and the freely available speech toolkit (HTK). There are phonological, dialectal, orthographic and morphological features of Amharic that challenge the development of ASRSs. The problem of resource scarcity is also a hindrance to the research and development in...

Sub-word Based Language Modeling for Amharic

This paper presents sub-word based language models for Amharic, a morphologically rich and under-resourced language. The language models have been developed (using an open source language modeling toolkit SRILM) with different n-gram order (2 to 5) and smoothing techniques. Among the developed models, the best performing one is a 5-gram model with modified Kneser-Ney smoothing and with interpola...

Automatic language identification

Automatic language identification is the process by which the language of a digitized speech utterance is recognized by a computer. In this paper, we will describe the set of available cues for language identification and discuss the different approaches to building working systems. This overview includes a range of historic approaches, contemporary systems that have been evaluated on standard ...

Morpheme-based automatic speech recognition for a morphologically rich language - Amharic

Out-of-vocabulary (OOV) words are a major source of error in a speech recognition system and various methods have been proposed to increase the performance of the systems by properly dealing with them. This paper presents an automatic speech recognition experiment conducted to see the effect of OOV words on the performance of the speech recognition system for Amharic (a morphologically rich language)....

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3606864